Constraints on σ_8 from galaxy clustering in N-body simulations and
ABSTRACT

We generate mock galaxy catalogues for a grid of different cosmologies, using
rescaled N-body simulations in tandem with a semi-analytic model run using
consistent parameters. Because we predict the galaxy bias, rather than fitting
it as a nuisance parameter, we obtain an almost pure constraint on σ_8 by
comparing the projected two-point correlation function we obtain to that from
the SDSS. A systematic error arises because different semi-analytic modelling
assumptions allow us to fit the r-band luminosity function equally well.
Combining our estimate of the error from this source with the statistical
error, we find σ_8 = 0.97 ± 0.06. We obtain consistent results if we use galaxy
samples with a different magnitude threshold, or if we select galaxies by
b_J-band rather than r-band luminosity and compare to data from the 2dFGRS. Our
estimate for σ_8 is higher than that obtained for other analyses of galaxy data
alone, and we attempt to find the source of this difference. We note that in
any case, galaxy clustering data provide a very stringent constraint on galaxy
formation models.
1 INTRODUCTION

For a given set of cosmological parameters in ΛCDM, the clustering of dark
matter can be studied very accurately through N-body simulations (e.g.,
Springel et al. 2005), or for that matter, through analytic models calibrated
by simulations (e.g. Smith et al. 2003). The clustering of dark matter is not
usually observed directly, however, though weak lensing shear-shear
correlations can provide (at present noisy) estimates. Redshift surveys may
furnish us with galaxy clustering statistics (see, e.g., Peacock 2003), while
weak lensing measurements, for example, normally probe the cross-correlation
between galaxies and dark matter (Refregier 2003 and references therein).

Galaxy clustering statistics derive a great deal of their power to constrain
cosmological parameters by constraining the scale at which the power spectrum
'turns over' on large scales, which complements the high-redshift CMB
constraint on this scale rather well. The baryonic features in the correlation
function or power spectrum add to the effectiveness of the constraint (Cole et
al. 2005; Eisenstein et al. 2005). The scales used in these joint constraints
tend to be large scales, where the evolution of clustering is still in the
linear regime or where deviations from linearity can be more readily modelled.
Moreover, in this regime the galaxy correlation function is expected, in the
absence of non-local effects, to have the same shape as the mass correlation
function, though offset by a constant factor (see, e.g., Coles 1993). This
offset - the (square of the) bias - depends on the galaxy population under
consideration; it depends, for example, on the threshold luminosity of the
sample. Because of this uncertainty, when the galaxy correlation function is
used to constrain cosmology, information on its overall normalization is not
normally used, and the constraints come entirely from its shape.

In this paper we generate synthetic galaxy clustering statistics by painting
galaxies from a semi-analytic model onto dark matter distributions given by
N-body simulations. We then compare these clustering statistics with those
from the SDSS to attempt to constrain cosmology. Our approach offers two
potential benefits. Firstly, because we attempt to generate realistic
catalogues with full galaxy properties, we can make a prediction for the bias
factor of a given galaxy sample and hence use the overall normalization of the
correlation function in our cosmological constraints. In particular, we may be
able to constrain σ_8, which is not possible for normal techniques employing
galaxy clustering. Secondly, because we populate the simulations on a
halo-by-halo basis rather than just assuming that galaxies approximately trace
mass on large scales, we generate a theoretical prediction for the
small-scale, nonlinear clustering. We can therefore attempt to use this
information in our cosmological constraints too.

Our constraints are largely independent from those using the CMB, and involve
different assumptions (though we consider only flat models, which one could
regard as implicitly using CMB results). Because dark energy has an effect on
structure formation, and different forms of dark energy might affect it in
different ways at late times, it is useful to have an independent, low-redshift
constraint on σ_8 that does not rely on a joint analysis with high-redshift
data (Doran, Schwindt & Wetterich 2001; Bartelmann, Doran & Wetterich 2006). A
joint analysis would tend to be more model-dependent as one must be able to
model what happens in the gap between observed snapshots of the Universe.



2 CONSTRUCTING GALAXY CATALOGUES 
2.1 Simulations 

In choosing parameters for our simulations, our aim was to generate simulation
outputs for a range of different values of Ω_M and σ_8 so that we could examine
galaxy clustering as a function of these quantities. Given our focus on
constraining σ_8, we opted to generate outputs with σ_8 taking values between
0.65 and 1.05, regularly spaced in steps of 0.05.

Measurements of the abundance of clusters constrain the high-mass end of the
halo mass function, and hence constrain a combination of Ω_M and σ_8 (e.g. Eke,
Cole & Frenk 1996). This combination is, very approximately, σ_8 Ω_M^0.5. To
test if we could break this degeneracy, we have generated two grids of models.
For Grid 1, the parameters of each model satisfy σ_8 Ω_M^0.5 = 0.8(0.3)^0.5,
while for Grid 2 they satisfy σ_8 Ω_M^0.5 = 0.9(0.3)^0.5. Within each grid, σ_8
takes on its full range of values between 0.65 and 1.05. It would be very
difficult to distinguish between two cosmologies lying on the same grid using
cluster abundances. The two 'cluster normalization' curves, with
σ_8 Ω_M^0.5 = const., are shown as the long-dashed and short-dashed lines in
Fig. 1. The pairs (Ω_M, σ_8) labelling the cosmologies we analyse are plotted
as crosses on these curves.
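As an illustration, the two cluster-normalized grids can be tabulated directly from the condition σ_8 Ω_M^0.5 = const. The following is a minimal sketch; the function and variable names are hypothetical and are not taken from any code used in the analysis.

```python
# Illustrative sketch only: `omega_m_on_grid`, GRID_1 and GRID_2 are
# hypothetical names introduced for this example.

def omega_m_on_grid(sigma8, sigma8_ref, omega_m_ref=0.3):
    """Omega_M satisfying sigma8 * Omega_M**0.5 = sigma8_ref * omega_m_ref**0.5."""
    return omega_m_ref * (sigma8_ref / sigma8) ** 2

# sigma8 from 0.65 to 1.05 in steps of 0.05 (nine values).
SIGMA8_VALUES = [round(0.65 + 0.05 * i, 2) for i in range(9)]

# Grid 1 passes through (Omega_M, sigma8) = (0.3, 0.8); Grid 2 through (0.3, 0.9).
GRID_1 = [(omega_m_on_grid(s8, 0.8), s8) for s8 in SIGMA8_VALUES]
GRID_2 = [(omega_m_on_grid(s8, 0.9), s8) for s8 in SIGMA8_VALUES]

if __name__ == "__main__":
    for label, grid in (("Grid 1", GRID_1), ("Grid 2", GRID_2)):
        for omega_m, s8 in grid:
            print(f"{label}: sigma8 = {s8:.2f}, Omega_M = {omega_m:.3f}")
```

Each grid therefore spans a wide range of Ω_M, with low σ_8 paired with high Ω_M and vice versa.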

We extract the mass distribution for these cosmologies from two simulations run
using the TREE-PM N-body code GADGET2 (Springel, Yoshida & White 2001; Springel
2005). Each simulation follows the evolution of 512^3 particles in a
300 h^-1 Mpc box. We have stored the simulation output at several redshifts.
Each of these snapshots of the mass distribution is then interpreted as a
z = 0.1 snapshot of a simulation with a different cosmology, to avoid having to
run a great number of simulations. We choose z = 0.1 since this is near the
median redshift of the main SDSS and 2dFGRS galaxy samples. The output
redshifts are chosen so that once the simulations are relabelled as z = 0.1
snapshots, the value of σ_8 at z = 0 for each simulation falls onto a regular
grid. Each simulation then gives us snapshots with σ_8 taking values between
0.65 and 1.05, regularly spaced by 0.05. Table 1 gives the value of Ω_M and σ_8
in these relabelled snapshots. We have chosen the simulation parameters such
that the first simulation, 'Run 1', has Ω_M = 0.3 at its σ_8 = 0.8 output,
while the second simulation, 'Run 2', has Ω_M = 0.3 at its σ_8 = 0.9 output.
When we perform a further rescaling of Ω_M (see below) it is these central
snapshots which remain unchanged. The initial conditions are calculated using a
Bardeen et al. (1986) power spectrum with shape parameter Γ = 0.14 and with
primordial spectral index n_s = 1. A smooth power spectrum was most convenient
in the light of the rescalings we carry out on the final output, but in fact
the Bardeen et al. (1986) power spectrum with Γ = 0.14 was found to be a good
fit to the CMBFAST (Seljak & Zaldarriaga 1996) spectrum with Ω_b = 0.045 used
for the Millennium Simulation (Springel et al. 2005), the parameters of which
were chosen to be in agreement with the one-year WMAP results (Spergel et al.
2003).
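The Ω_M part of this relabelling can be sketched for a flat ΛCDM model as follows. This is only an illustration under stated assumptions: the function names are hypothetical, only the standard flat-universe expression for the instantaneous matter density parameter is used, and the corresponding relabelling of σ_8 (which requires the linear growth factor) is not shown.

```python
# Illustrative sketch only; all function names are hypothetical.

def omega_m_at_z(omega_m0, z):
    """Instantaneous matter density parameter at redshift z (flat Lambda-CDM)."""
    matter_term = omega_m0 * (1.0 + z) ** 3
    return matter_term / (matter_term + 1.0 - omega_m0)

def omega_m0_from_snapshot(omega_m_inst, z_label):
    """z = 0 value of Omega_M for a flat model whose instantaneous
    value at redshift z_label is omega_m_inst."""
    return omega_m_inst / (
        omega_m_inst + (1.0 - omega_m_inst) * (1.0 + z_label) ** 3
    )

def relabel(omega_m0_sim, z_out, z_label=0.1):
    """Omega_M (quoted at z = 0) of the cosmology obtained by treating the
    z_out output of a simulation as a snapshot at z_label."""
    return omega_m0_from_snapshot(omega_m_at_z(omega_m0_sim, z_out), z_label)

if __name__ == "__main__":
    # An output taken at z = 0.1 is unchanged by the relabelling.
    print(relabel(0.3, 0.1))
    # An earlier output corresponds to a higher-density cosmology.
    print(relabel(0.3, 0.5))
```

Earlier outputs map onto cosmologies with larger Ω_M, which is the sense of the trend in Table 1.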

Figure 1. The position in the (Ω_M, σ_8) plane of the outputs of our two
simulations. The line connecting the outputs shows how the values of Ω_M and
σ_8 change as the simulation evolves, where we track the instantaneous values
of these parameters rather than the values they will have at the final time,
which would be the more conventional labelling (and would, of course, not
change during the course of the simulation). The solid line corresponds to
Run 1 and the dotted line corresponds to Run 2. Low redshift outputs (lower
density, more clustered) are in the top left, while high redshift outputs
(higher density, less clustered) are in the bottom right. Also shown are the
curves described by the cluster normalization condition that
σ_8 Ω_M^0.5 = const., for two different values of this constant. We rescale
simulation outputs so that they lie on these curves. The rescaling is shown
schematically by the red arrows.

Once we have relabelled the simulation snapshots as z = 0.1 snapshots, they lie
on a curve in (Ω_M, σ_8) space which reflects the way the dark matter density
is reduced and the amplitude of clustering is increased as the simulation
evolves. These curves are
shown as the solid and dotted lines in Fig. 1. We rescale Ω_M in each snapshot,
so that instead the snapshots lie on one of the cluster-normalized curves
described above. The rescaling is achieved in practice by applying the results
of Zheng et al. (2002). If the particle mass is scaled in the obvious way to
obtain the desired Ω_M, the particle velocities must also be scaled to
compensate, else the haloes no longer satisfy the virial relation between their
kinetic and potential energy, and the galaxy populations of haloes are easily
distinguished in the different cosmologies via dynamical observables. The
rescalings in Ω_M which move a snapshot onto the cluster-normalization curve
are shown schematically as red arrows in Fig. 1. Each cluster-normalized grid
contains rescaled snapshots from both simulations, and the simulation
parameters were chosen so that the rescalings would never have to be too large.
For some of the cosmologies on our grid, we could choose to rescale from either
of our simulation runs without having to change Ω_M by a large factor. We have
used these cosmologies to test that the results using either simulation run are
consistent, and hence that our rescaling works as expected.
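The virial argument above can be made concrete with a small sketch. The scaling used here is an assumption that follows only from that argument: with positions held fixed, masses scale in proportion to Ω_M, so velocities must scale as the square root of the same ratio to preserve the ratio of kinetic to potential energy. It is not claimed to reproduce the full prescription of Zheng et al. (2002), and all names are hypothetical.

```python
import math

# Illustrative sketch only; names and the sqrt(ratio) velocity factor are
# assumptions based on the virial argument, not the published prescription.

def rescale_omega_m(particle_mass, velocities, omega_m_old, omega_m_new):
    """Scale the particle mass by the Omega_M ratio and velocities by its
    square root, leaving positions untouched."""
    ratio = omega_m_new / omega_m_old
    v_factor = math.sqrt(ratio)
    new_velocities = [tuple(v_factor * c for c in v) for v in velocities]
    return particle_mass * ratio, new_velocities

def virial_ratio(particle_mass, positions, velocities, grav_const=1.0):
    """Return 2K/|W| for a set of equal-mass particles."""
    kinetic = 0.5 * particle_mass * sum(c * c for v in velocities for c in v)
    potential = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            separation = math.dist(positions[i], positions[j])
            potential -= grav_const * particle_mass ** 2 / separation
    return 2.0 * kinetic / abs(potential)

if __name__ == "__main__":
    pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    vel = [(0.1, 0.0, 0.0), (0.0, 0.2, 0.0), (0.0, 0.0, 0.3)]
    before = virial_ratio(1.0, pos, vel)
    new_mass, new_vel = rescale_omega_m(1.0, vel, 0.3, 0.24)
    after = virial_ratio(new_mass, pos, new_vel)
    print(before, after)  # the two values agree to rounding error
```

Scaling the mass alone would change |W| by the square of the ratio but K only linearly, which is why the velocities cannot be left untouched.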

The simulation code runs SUBFIND (Springel et al. 2001) on the fly, providing
us with a list of friends-of-friends haloes (Davis et al. 1985) of more than 20
particles, and their substructures. We use a linking length of 0.2 times the
mean inter-particle separation in the friends-of-friends algorithm to identify
the haloes.
SUBFIND also allows us to identify the particle in the halo with the least
gravitational potential energy, which we use in our galaxy placement scheme.

Table 1. Cosmological parameters of simulation outputs after having been
relabelled as z = 0.1 outputs. We follow the usual convention that these are
the parameter values the simulation would have if evolved further to z = 0.

       Run 1               Run 2
   Ω_M      σ_8        Ω_M      σ_8
    -        -        0.120    1.050
    -        -        0.173    1.000
   0.102    0.950     0.234    0.950
   0.159    0.900     0.300    0.900
   0.226    0.850     0.370    0.850
   0.300    0.800     0.442    0.800
   0.379    0.750     0.514    0.750
   0.460    0.700     0.585    0.700
   0.541    0.650     0.654    0.650
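For orientation, the resolution figures implied by these choices can be worked out directly. The sketch below assumes the standard value of the critical density, 2.775 × 10^11 h^2 M_sun Mpc^-3; the function names are hypothetical.

```python
# Illustrative sketch only; function names are hypothetical.

RHO_CRIT = 2.775e11  # critical density in h^2 M_sun Mpc^-3

def particle_mass(omega_m, box_size=300.0, n_per_side=512):
    """Particle mass in h^-1 M_sun for a box of side box_size (h^-1 Mpc)."""
    return omega_m * RHO_CRIT * box_size ** 3 / n_per_side ** 3

def linking_length(box_size=300.0, n_per_side=512, b=0.2):
    """Friends-of-friends linking length: b times the mean
    inter-particle separation, in h^-1 Mpc."""
    return b * box_size / n_per_side

def min_halo_mass(omega_m, n_min=20):
    """Mass of the smallest halo treated as resolved (n_min particles)."""
    return n_min * particle_mass(omega_m)

if __name__ == "__main__":
    print(f"particle mass  : {particle_mass(0.3):.3e} h^-1 M_sun")
    print(f"20-particle halo: {min_halo_mass(0.3):.3e} h^-1 M_sun")
    print(f"linking length : {linking_length():.4f} h^-1 Mpc")
```

Because the particle mass scales with Ω_M, the mass of the smallest resolved halo differs between the cosmologies on the grid.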



2.2 Semi-analytic model 

The properties of galaxies in our catalogues are generated using the
semi-analytic galaxy formation code, GALFORM (Cole et al. 2000; Benson et al.
2002, 2003; Baugh et al. 2005). For the purposes of this paper, we may consider
a semi-analytic model as being a means of predicting, given some dark matter
halo at a redshift of interest, the galaxy population of that halo. Having that
information, we can construct galaxy luminosity functions, correlation
functions, etc. that might be considered the results or predictions of the
model.

The first step in predicting the galaxy population of a halo is calculating the
merger history of the halo. In simulations of sufficiently high resolution and
with a sufficiently large number of outputs, this can be extracted from the
N-body data. In the case of GALFORM, this has been done recently by Bower et
al. (2006) with the Millennium Simulation (Springel et al. 2005); the same
simulation has also been used by Croton et al. (2006) and De Lucia et al.
(2006) to generate catalogues using a different semi-analytic code. The
simulations we describe above, by contrast, do not have sufficient resolution
for us to extract reliable merger trees for the haloes of interest. We
therefore employ instead a Monte Carlo scheme based on the work of Lacey & Cole
(1993) and using the algorithm described by Cole et al. (2000). This generates
a merger tree for a halo based only on the halo mass, the cosmology and the
initial power spectrum, and does not use other data from the simulation. This
scheme does not, therefore, provide galaxy positions; our method of placing
galaxies is given, instead, in Section 2.3.

Unfortunately, the statistical properties of merger histories generated by this
algorithm are not identical to histories extracted directly from an N-body
simulation (Cole et al. 2007). Parkinson, Cole & Helly (2007) and Neistein &
Dekel (2007) have devised empirically motivated modifications to the algorithm
to allow Monte Carlo trees to fit the simulation data better. A detailed
analysis of the effect of such a modification on semi-analytic galaxy
properties is beyond the scope of this paper. We have, though, tested some of
our results using the new algorithm of Parkinson et al. (2007), and find that
for our purposes the new trees make little difference.

Given the merger history of a halo, the model computes the evolution of the
baryonic content of the halo using a variety of analytic prescriptions. Many of
the equations governing the physical processes modelled by GALFORM contain
parameters which may be adjusted. Some of these (for example, the form factor
f_orbit/c which governs the size of merger remnants) have a 'natural' value
determined by the physics; others (those governing the angular momentum
distribution of infalling haloes, say) are derived by comparison to more
detailed simulations. The function of allowing these parameters to change,
then, is to allow investigation into the magnitude of the effect of different
physical processes on the resulting galaxy properties in the model. Other
parameters have no natural value, and can only be fixed by requiring that they
take values which allow the model to fit observations. Much of the time, if we
are able to fit some set of observations satisfactorily by choosing the
parameters of the model judiciously, the same set of observations could also be
fit reasonably well by some very different choice of parameters. Therefore,
within the GALFORM framework, we have different models using different physics
which are equally good at matching the observations (though this may not, of
course, be the case if we were to choose a different set of observations to
constrain the model).

Our aim here is to try to constrain cosmological parameters by comparing
clustering statistics from a simulation populated with semi-analytic galaxies
to the corresponding measurements in an observational survey. We would hope
that our constraints are insensitive to the precise semi-analytic model used,
and we would like to test whether this is the case. Therefore, although we use
only one code, GALFORM, we use three different 'models', in the sense of
different combinations of the physics we attempt to model and the parameters
governing that physics. In the remainder of this section of the paper, we
discuss the technical differences between the three models before briefly
describing how galaxies are placed in the simulations in Section 2.3. A reader
uninterested in the details of the models may therefore wish to skip to
Section 2.3 or to our results in Section 3. The three models are as follows:



• The fiducial model of Cole et al. (2000). This is successful in matching
several sets of observations, including the B- and K-band luminosity functions,
galaxy colours and mass-to-light ratios for galaxies of different morphologies,
the cold gas mass in galaxies, galaxy disc sizes and the slope and scatter of
the I-band Tully-Fisher relation. Unfortunately, though, it assumes a cosmic
baryon fraction, Ω_b, of only 0.02. This is inconsistent with recent estimates
from Big Bang nucleosynthesis (e.g., O'Meara et al. 2001) and the cosmic
microwave background (Spergel et al. 2007). Nevertheless, we feel it is
worthwhile to include this model in our analysis as a well recognized and well
understood model that has been thoroughly described and studied. In our
figures, lines corresponding to output from this model are given the label
'Cole2000'.

• A model similar to the first, but with Ω_b = 0.04, which is closer to current
estimates. Since there are twice as many baryons as in the first model, if we
leave the rest of the parameters unchanged then, as expected, the model is
unable to match observations such as the luminosity function. Therefore we
introduce a new physical process: thermal conduction in massive haloes (this is
analysed in greater detail by Benson et al. 2003). We simply assume that gas is
unable to cool if the halo circular velocity, V_circ, satisfies

$$V_{\rm circ} > V_{\rm cond}\sqrt{1+z}$$    (1)

where V_cond is a parameter we may adjust. This suppresses the problematic
bright end of the luminosity function; the effect is similar, in fact, to more
recent and more physically motivated implementations of feedback from active
galactic nuclei in GALFORM (Bower et al. 2006). Though it is clearly rather
crude, note that our objective here is only to produce a realistic enough
galaxy catalogue to compare to observations. We are trying to mimic the effect
of whatever physical process suppresses the bright end of the luminosity
function, without having to adopt a complicated parametrization that is no
better physically motivated than a more simple and understandable one. The
label we give to this model in our figures is 'C2000hib' (where 'hib' stands
for 'high baryon fraction').

• Another model with Ω_b = 0.04, but now incorporating 'superwinds'. In this
model it is postulated that a galaxy's cold gas is heated strongly enough for
it to be expelled completely from the halo, rather than returning to the
reservoir of hot gas associated with the halo. In fact, the model is derived
from that used by Baugh et al. (2005) to reproduce the abundance of faint
galaxies detected at submillimetre wavelengths. This also incorporates the
additions and refinements to GALFORM described by Benson et al. (2003). These
include a modification to the assumed profile of the halo gas, a more
sophisticated treatment of conduction, and a more detailed treatment of galaxy
mergers, in particular the effects of tidal stripping and dynamical friction
(Benson et al. 2002). They also include a simple model of the effect of
reionization on small haloes, where cooling is prevented if V_circ < V_cut and
z < z_cut for two parameters V_cut and z_cut. The strength of superwind
feedback is parametrized by V_SW, the characteristic velocity of the wind. The
model is denoted 'Model M' in our figures.
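The two simple cooling cut-offs described in the list above can be summarized in a few lines. This is a hedged sketch, not GALFORM code: it assumes the conduction criterion has the form V_circ > V_cond (1 + z)^0.5, combines both criteria in one hypothetical function for convenience, and uses purely illustrative parameter values.

```python
import math

# Illustrative sketch only; `cooling_allowed` is a hypothetical helper and
# the numerical values used below are not taken from the models.

def cooling_allowed(v_circ, z, v_cond=None, v_cut=0.0, z_cut=0.0):
    """Return True if gas in a halo of circular velocity v_circ at
    redshift z is allowed to cool under the two simple criteria."""
    # Conduction: no cooling in sufficiently massive haloes
    # (assumed form: v_circ > v_cond * sqrt(1 + z)).
    if v_cond is not None and v_circ > v_cond * math.sqrt(1.0 + z):
        return False
    # Reionization: no cooling in small haloes at low redshift.
    if v_circ < v_cut and z < z_cut:
        return False
    return True

if __name__ == "__main__":
    print(cooling_allowed(300.0, 0.0, v_cond=200.0))          # massive halo, z = 0
    print(cooling_allowed(300.0, 3.0, v_cond=200.0))          # same halo at z = 3
    print(cooling_allowed(50.0, 2.0, v_cut=60.0, z_cut=6.0))  # small halo after z_cut
```

With v_cond left as None and v_cut = 0, neither cut-off applies and cooling is always allowed.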

We wish to run each of these models in many different cosmologies, in order to
generate galaxy catalogues in which the N-body component and the semi-analytic
component are consistent. Changing cosmology naturally changes the galaxy
population predicted by each model, however, so that even if we fix the
parameters such that the galaxies match observational constraints in one
fiducial cosmology, they are unlikely to match in other cosmologies. We
therefore tweak the parameters between different cosmologies to try to match
the data. It is not possible to do this for the full range of even the primary
constraints described by Cole et al. (2000). We restrict ourselves to a
comparison with the ^{0.1}r-band SDSS luminosity function of Blanton et al.
(2003) at z = 0.1. Even then, to make the problem tractable and to ensure our
three models remain distinct, we restrict the parameters we allow ourselves to
vary to:

• V_SW, for Model M only.

• V_cond, for the C2000hib model only.

• V_cut, one of the parameters controlling reionization. Though we experimented
with changing this for all the models, all the ones below have either
V_cut = 0 (Cole2000 and C2000hib) or V_cut = 60 (Model M). As well as being
simpler, this also helps make Model M more distinct from the other two.

• V_hot and α_hot. These are closely linked but we vary them independently.
They control the strength of standard (i.e. not superwind) supernova feedback
in the following way. The rates of change of the mass of hot gas and of cool
gas in a halo are linked with the instantaneous star formation rate, ψ, by:

$$\dot{M}_{\rm hot} = -\dot{M}_{\rm cool} + \beta\psi$$    (2)

(Cole et al. 2000, equation 4.7). β is related to the circular velocity of the
galaxy disc, V_disc, by

$$\beta = (V_{\rm disc}/V_{\rm hot})^{-\alpha_{\rm hot}}$$    (3)

(Cole et al. 2000, equation 4.15).
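The supernova feedback relations quoted from Cole et al. (2000) (equations 2 and 3 here) can be evaluated as in the sketch below. The function names are hypothetical and the numerical values are purely illustrative rather than the parameters of any of our models.

```python
# Illustrative sketch only; names and example values are hypothetical.

def beta(v_disc, v_hot, alpha_hot):
    """Feedback efficiency: beta = (V_disc / V_hot) ** (-alpha_hot)."""
    return (v_disc / v_hot) ** (-alpha_hot)

def mdot_hot(mdot_cool, star_formation_rate, v_disc, v_hot, alpha_hot):
    """Rate of change of the hot gas mass: -Mdot_cool + beta * psi."""
    return -mdot_cool + beta(v_disc, v_hot, alpha_hot) * star_formation_rate

if __name__ == "__main__":
    # Feedback is much stronger in discs with low circular velocity.
    for v_disc in (50.0, 100.0, 200.0, 400.0):
        print(v_disc, beta(v_disc, v_hot=200.0, alpha_hot=2.0))
```

Raising V_hot increases β at every V_disc, while raising α_hot steepens the dependence on V_disc, pivoting about V_disc = V_hot.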

Some of the cosmologies below require quite extreme, perhaps unphysical,
parameter values. In some cases, GALFORM is reluctant to run, while in others
the fit for some observations is compromised in an attempt to fit the
^{0.1}r-band luminosity function well. In addition to running each model with
tweaked parameters in each cosmology, therefore, we also run each model in each
cosmology using the same parameters as the central cosmology of Run 1, in which
(Ω_M, σ_8) = (0.3, 0.8).

We are not able to produce a good fit to the luminosity function in a χ² sense,
even allowing these parameters to vary. This may be a concern when comparing
clustering statistics to observational data. Volume-limited galaxy samples, to
which we wish to compare our results later, are chosen such that all the
galaxies are brighter than some given absolute magnitude limit. If we choose a
sample of semi-analytic galaxies with the same magnitude limit, then because
our luminosity function is wrong we may not be choosing a sample that
necessarily corresponds to the observational one, even within our model.
Therefore we instead select semi-analytic galaxy samples with a magnitude
threshold such that the sample has the same space density as the corresponding
observational sample. This means that when we adjust the parameters of the
model to match the luminosity function, it is more important for our purposes
to match its overall shape rather than its magnitude normalization.
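Selecting a sample by space density rather than by a fixed magnitude limit amounts to taking the brightest N galaxies, where N is the target density multiplied by the simulation volume. A minimal sketch, with hypothetical names and toy input values:

```python
# Illustrative sketch only; `select_by_space_density` is a hypothetical helper.

def select_by_space_density(magnitudes, target_density, volume):
    """Return the indices of the brightest galaxies making up the target
    space density, and the magnitude threshold this implies.

    magnitudes     : absolute magnitudes (more negative = brighter)
    target_density : desired number density, in inverse volume units
    volume         : simulation volume, in matching units
    """
    n_select = min(int(round(target_density * volume)), len(magnitudes))
    order = sorted(range(len(magnitudes)), key=lambda i: magnitudes[i])
    chosen = order[:n_select]
    threshold = magnitudes[chosen[-1]] if chosen else None
    return chosen, threshold

if __name__ == "__main__":
    mags = [-22.0, -19.0, -21.0, -20.5, -18.0]
    chosen, threshold = select_by_space_density(mags, target_density=3.0, volume=1.0)
    print(chosen, threshold)  # indices of the three brightest and their faintest magnitude
```

The implied threshold generally differs from the observational magnitude limit; that offset is what the scaling discussed next is meant to quantify.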

The Υ parameter of Cole et al. (2000) is related to this sort of scaling of the
luminosity function. It was introduced to account for brown dwarfs, which
absorb some of the mass of gas assumed to be tied up in stars, but without
producing light. It is defined by

$$\Upsilon = \frac{\text{mass in visible stars} + \text{mass in brown dwarfs}}{\text{mass in visible stars}}$$

(Cole et al. 2000, equation 5.2). Clearly, then, we must have Υ ≥ 1 given this
physical explanation. The result of including this effect is to scale
luminosities by a factor 1/Υ. For each GALFORM model we run, we compare the
resulting ^{0.1}r-band luminosity function with the observational value from
the SDSS (in fact, we compare only one point near the characteristic
luminosity, L*). We express the difference between the two in terms of the Υ
parameter: the (reciprocal of the) amount by which we would have to scale the
luminosity of the semi-analytic galaxies to match the data. Sometimes this
requires Υ < 1. Therefore, when we give a value of Υ below, it should be
treated only as an indication of the amount by which we would have to scale
luminosities so that when we select a galaxy sample by a number density
threshold then it would have the same luminosity threshold as the observational
sample. Note that we calculate Υ by reference to a specific point on the SDSS
^{0.1}r-band luminosity function. Its exact value would change if we normalized
at a different point (since the model luminosity function is not the same shape
as the observational one), or in a different band (since the colour of model
galaxies may be incorrect).
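Since luminosities scale as 1/Υ, a model magnitude M becomes M + 2.5 log10 Υ after rescaling, so Υ can be read off from the offset between the model and observed threshold magnitudes at fixed space density. The sketch below illustrates this arithmetic; the function names and the example magnitudes are hypothetical.

```python
import math

# Illustrative sketch only; names and example magnitudes are hypothetical.

def upsilon_from_magnitudes(m_model, m_obs):
    """Upsilon needed so that a model magnitude m_model maps onto m_obs
    when luminosities are scaled by 1 / Upsilon."""
    return 10.0 ** (0.4 * (m_obs - m_model))

def rescale_magnitude(m_model, upsilon):
    """Magnitude after scaling the luminosity by 1 / upsilon."""
    return m_model + 2.5 * math.log10(upsilon)

if __name__ == "__main__":
    # A model threshold 0.75 mag fainter than the observed one needs Upsilon < 1.
    ups = upsilon_from_magnitudes(m_model=-20.75, m_obs=-21.5)
    print(ups)
    print(rescale_magnitude(-20.75, ups))
```

A model whose galaxies are too faint at fixed space density therefore yields Υ < 1, which has no interpretation in terms of brown dwarfs.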

Once we have given ourselves the freedom to scale the luminosity function in
this way, then, the effect of varying the parameters we allow ourselves to
change to try to match the shape of the luminosity function is as follows:

• Increasing V_SW, or decreasing V_cond, tends to steepen the bright-end slope,
i.e. give fewer very bright galaxies. V_SW is only non-zero for Model M; V_cond
is only finite for the C2000hib model.

• Increasing V_cut reduces the slope at the faint end, reducing the number of
the faintest of the galaxies we study.

• Increasing V_hot tends to suppress the overall space density of galaxies.
Because of the effect of the other parameters, it is most useful for adjusting
the abundance of galaxies of around L*, or a little fainter.

• Changes in α_hot can be viewed as modulating the effect of changing V_hot.
Visually, for typical values of V_hot, increasing α_hot flattens the faint-end
slope, typically over a wider range of luminosities than V_cut.

Figure 2. r-band luminosity functions for the three fiducial GALFORM models,
compared to the SDSS luminosity function of Blanton et al. (2003) (solid line)
and a 2dFGRS r-band luminosity function (dotted line, mostly obscured by the
others) generated using the SuperCOSMOS r_F-band data (Hambly et al. 2001). The
value of the scaling parameter Υ, required to normalize the model luminosities
for galaxies of a particular space density in this band, is also given in the
legend. Errors on the SDSS luminosity function are only given for every tenth
point, for clarity.

We usually find that making the bright-end slope steeper and the faint-end
slope shallower, as required by the data, needs all parameters tweaked to give
larger amounts of feedback. This tends to have the overall effect of reducing
the predicted luminosities, leading to Υ < 1 as mentioned above. Requiring
Υ ≥ 1 would therefore require us to compromise one component or other of the
shape in these models. Since we later rescale to match space densities anyway,
we opt not to make this compromise. Given two parameter combinations which both
match the luminosity function reasonably well and which both give Υ < 1, we use
the Υ parameter as a tie-breaker, selecting the combination which gives Υ
closer to unity. We have also checked that each model is at least qualitatively
consistent with the other primary GALFORM constraints (the Tully-Fisher
relation, disc sizes, morphological mix, metallicities and gas fractions; see
Cole et al. 2000).

The r-band luminosity function for each of our models in our fiducial cosmology
(Ω_M = 0.3 and σ_8 = 0.8) is shown in Fig. 2. Qualitatively, the agreement
between the models and the data is reasonable. The very sharp cutoff at the
bright end of the luminosity function in Model M is a generic feature of the
model. The lower space density of very faint galaxies in this model is also
generic, and comes from the introduction of reionization (non-zero V_cut). At
first glance, it appears that the Cole2000 model gives better agreement with
the data at the bright end than the C2000hib model, despite the inclusion of a
feedback mechanism specifically to solve this problem in the latter. Recall,
though, that the Cole2000 model has a lower baryon fraction, and despite this
we have to introduce relatively high levels of supernova feedback to match the
shape of the luminosity function. We therefore need Υ ≈ 0.5 to recover the
correct luminosities, while the C2000hib model needs a much more physically
palatable Υ ≈ 1.2. It may appear that our requirement for Υ ≈ 0.5 is
inconsistent with the original Cole et al. (2000) paper, the reference model of
which requires Υ = 1.38. It is not inconsistent, for a few reasons. Firstly, we
use the label 'Cole2000' for our model because it uses equivalent code with the
same physics governed by the same parameters as the models of Cole et al.
(2000). As we have just noted, however, some of the parameters take different
values in our fiducial model in order to try to match the shape of the r-band
luminosity function. Secondly, while the reference model of Cole et al. (2000)
had σ_8 = 0.93, ours has σ_8 = 0.8. Thirdly, their Υ was calculated by
reference to the value of the observed b_J-band luminosity function at L*
(though in fact the same correction also provided a good match to the K-band
luminosity function). Ours is calculated by reference to the r-band luminosity
function. We match to a point slightly brighter than L* (where the exponential
cutoff has started to bite more deeply and the galaxies are less abundant; the
point at M_r ≈ -21.5 can be seen quite easily in Fig. 2 as being where all the
lines cross) since we otherwise had problems calculating Υ for some of our
models with a very shallow faint-end slope and low galaxy number density.



2.3 Galaxy placement 

With the N-body simulations and the semi-analytic catalogues in place, it
remains to merge the two to create a synthetic galaxy catalogue, or in other
words to populate the simulations with our galaxies. To each halo in the
simulation we assign a semi-analytic galaxy population for a random merger tree
of the same mass. We then place the central galaxy at the position of the
particle with least gravitational potential energy, and place the satellite
galaxies on random particles within the halo.
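The placement step for a single halo can be sketched as follows. The names are hypothetical, and the sketch assumes that satellites are drawn with replacement from all particles of the halo, a detail the description above does not specify.

```python
import random

# Illustrative sketch only; `place_galaxies` is a hypothetical helper, and
# sampling satellites with replacement is an assumption.

def place_galaxies(halo_particles, most_bound_index, n_satellites, rng=random):
    """Return (central_position, satellite_positions) for one halo.

    halo_particles   : list of (x, y, z) particle positions in the halo
    most_bound_index : index of the particle with the least potential energy
    n_satellites     : number of satellite galaxies predicted for the halo
    rng              : object providing choice(), e.g. random.Random(seed)
    """
    central = halo_particles[most_bound_index]
    satellites = [rng.choice(halo_particles) for _ in range(n_satellites)]
    return central, satellites

if __name__ == "__main__":
    halo = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.2, 0.1), (0.3, 0.1, 0.0)]
    central, satellites = place_galaxies(halo, 1, 3, rng=random.Random(42))
    print("central   :", central)
    print("satellites:", satellites)
```

Placing satellites on particles means that, on average, they trace the mass distribution within each halo.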

One might worry that given the resolution of our simulations, it is possible
for the semi-analytic model to predict that a halo in a simulation contains a
bright enough galaxy to enter our sample even though the halo is not resolved
with at least 20 particles, which is our normal criterion for considering the
halo to be resolved. To take account of these galaxies, we calculate the number
of such haloes expected for the simulation volume for the Jenkins et al. (2001)
mass function. We then take the galaxy populations predicted by GALFORM for
these haloes and place the galaxies on random particles in the simulation which
are not in haloes. We do not expect this to have a significant effect on
clustering statistics, since almost all galaxies which would be placed in
unresolved haloes are very faint, and in any case the halo bias as a function
of mass is not strongly varying in this regime (Cole & Kaiser 1989; Mo et al.
1999) so that we do not lose too much accuracy by placing galaxies in haloes of
the wrong mass. None the less, we have checked that employing this scheme has
only a small effect on our measured correlation functions. Changing the minimum
resolved mass from 20 particles to 50 particles has only a very small effect on
the correlation function, as does ignoring the 'unresolved' galaxies entirely,
even for a conservative mass limit of 50 particles. Moreover, this remains true
even for galaxy samples which are rather faint when compared to the magnitude
limit of the SDSS samples we will be considering, and which therefore provide a
more stringent test.
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Figure 3. Clustering in our models and in the SDSS. r p w p (r p ) is plotted for clarity. The solid, black line with error bars is the SDSS data; the dotted line 
shows the SDSS flux-limited sample for comparison. The coloured lines are from our three models: short-dashed red for Cole2000, long-dashed green for 
C2000hib and dot-dashed blue for Model M, as in Fig. [2] The nine different cosmologies form a grid in erg with the cosmological parameters lying on the 
curve crgf!^ 5 = 0.8(0.3) 0,5 . This plot shows models for which we allow ourselves to tweak the GALFORM parameters to match the luminosity function. 



3 RESULTS 

3.1 Observational samples 

In their study of the luminosity and colour dependence of the
galaxy correlation function using the main galaxy sample of the
SDSS, Zehavi et al. (2005) calculated the projected two-point
correlation function $w_p(r_p)$ for ten different galaxy samples
defined by thresholds in absolute magnitude. We have been provided
with these correlation functions, and their covariance matrices
calculated by jackknife resampling. Zehavi et al. also tabulate the
space density of each sample, so it is straightforward for us to select
corresponding samples of semi-analytic galaxies.
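Selecting a model sample at a tabulated space density amounts to keeping the brightest galaxies until the target number density is reached. The sketch below shows one way to do this; the function name and arguments are hypothetical, and it assumes the catalogue fills a known comoving volume.

```python
import numpy as np

def select_by_space_density(magnitudes, volume, n_target):
    """Pick the brightest galaxies so the sample has density n_target.

    magnitudes : absolute magnitudes (more negative = brighter)
    volume     : catalogue volume, in the same length units as n_target
    n_target   : desired space density (number per unit volume)
    Returns the indices of the selected galaxies and the implied
    magnitude threshold (the magnitude of the faintest one selected).
    """
    mags = np.asarray(magnitudes, dtype=float)
    n_select = min(int(round(n_target * volume)), mags.size)
    order = np.argsort(mags)          # brightest first
    chosen = order[:n_select]
    threshold = mags[chosen[-1]] if n_select > 0 else -np.inf
    return chosen, threshold
```

Matching on space density rather than on the magnitude threshold itself means that small residual differences between the model and observed luminosity functions do not change the size of the sample.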

Our cosmological constraints will use samples with a galaxy
space density $n_g = 0.00308\,h^3\,{\rm Mpc}^{-3}$, corresponding to
galaxies with $M_r - 5\log_{10} h < -20.5$ in the SDSS. Our semi-analytic
catalogues have approximately twice the effective volume of the
observational sample, so when calculating how well our models
fit the data we use only the covariance matrix of the observational
correlation function to compute our errors, neglecting the statistical
errors on the simulated function. We use the sample of this space
density since it provides a good compromise between volume and
space density, giving relatively small errors, and since most of the
constraining power then comes from the galaxies of intermediate
luminosity which are modelled best by the semi-analytic code. We
will, though, briefly discuss the effect of using samples of a
different space density or selected in a different waveband.

Figure 4. A comparison between our four sets of models for one particular
value of $\sigma_8$. The top two panels are for models in which we tweak the
semi-analytic parameters to match the r-band luminosity function, while for the
bottom two the parameters stay the same as for our fiducial model. The two
left-hand panels are for the low-$\Omega_{\rm M}$ cluster normalization curve while the
others are for the high-$\Omega_{\rm M}$ case. The colour coding and line styles of the
models are the same as for Figs. [2] and [3].



Table 2. The key to the model numbering used in Fig. [3]. The first column
gives the label we assign to each of our 12 sets of populated simulations
(each of which has nine cosmologies, regularly spaced in $\sigma_8$). The 'Grid'
in the second column refers to whether the cosmologies lie on the cluster
normalized curve with high $\Omega_{\rm M}$ (Grid 1) or low $\Omega_{\rm M}$ (Grid 2). The third
column shows whether we adopt different parameters in different
cosmologies, or whether they stay the same, while the fourth gives the GALFORM
model in use.



Model no.   Grid   Same/diff. pars.   GALFORM model
1           1      diff.              C2000hib
2           1      diff.              Cole2000
3           1      diff.              M
4           1      same               C2000hib
5           1      same               Cole2000
6           1      same               M
7           2      diff.              C2000hib
8           2      diff.              Cole2000
9           2      diff.              M
10          2      same               C2000hib
11          2      same               Cole2000
12          2      same               M



3.2 Constraints 

We compare the clustering in our synthetic catalogues and in the
SDSS in Fig. [3]. We plot the quantity $r_p w_p(r_p)$ since this scales out
much of the $r_p$ dependence and makes differences in shape easier to
see. Fig. [3] shows our results for a grid of nine cosmologies spaced
regularly in $\sigma_8$ such that they lie on the curve
$\sigma_8 \Omega_{\rm M}^{0.5} = 0.8(0.3)^{0.5}$
('Grid 1'). For this plot we show the models for which we allow the
semi-analytic parameters to vary so as to provide a good match to
the r-band luminosity function. Note that we have three other
similar sets of models: one which includes the same cosmologies but
in which the semi-analytic parameters are identical in each
cosmology, and two more in which the cosmologies lie on the same grid
in $\sigma_8$ but which have lower $\Omega_{\rm M}$, such that
$\sigma_8 \Omega_{\rm M}^{0.5} = 0.9(0.3)^{0.5}$
('Grid 2'). In one low-$\Omega_{\rm M}$ sequence the semi-analytic parameters
are allowed to vary, and in the other they are not. A figure similar to
Fig. [3] could be made for each of the latter three sets of catalogues,
but since the features turn out to be qualitatively similar we do not
show such plots here. We do, though, compare the four sets for one
particular value of $\sigma_8$ in Fig. [4]. For each grid of cosmologies and
for each choice as to whether to allow the parameters to vary we
have catalogues for each of our three models, and therefore in total
we have twelve sets of catalogues, each of which has nine members
lying on a regular grid in $\sigma_8$. The key to our numbering of these
sets is given in Table [2].

Examining Fig. [3], it is clear that some of our catalogues fit the
data better than others. For the higher $\sigma_8$ cosmologies, the shape
of the models fits that of the data rather well. The trend between
cosmologies is consistent between the three GALFORM models: a
higher amplitude of clustering for higher $\sigma_8$, as expected. There
are differences between the models, however, especially on small
scales, which must arise from differences in the details of the halo
occupation distribution predicted by the models.

The variation in the predicted correlation function between
cosmologies, and the consistent trend between models, supports
our hope that comparison to the SDSS correlation function can
constrain $\sigma_8$. For each set of nine cosmologies, we calculate $\chi^2$ with
respect to the observed correlation function and its covariance
matrix, then fit a quadratic through the three points around the
minimum to interpolate and estimate the best-fitting $\sigma_8$ and its $1\sigma$ error.
The result of applying this procedure is given in Fig. [5]. There, we
give an estimate of $\sigma_8$ and its errors for each of our twelve different
sets of populated simulations, as the black crosses and error bars.
The model number referred to on the x-axis is explained in Table [2].
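The fitting procedure just described can be sketched as follows. The function names are hypothetical, and the $1\sigma$ error is taken here to be the half-width at which the fitted parabola rises by $\Delta\chi^2 = 1$ above its minimum, which is the usual convention but is an assumption about the exact definition used.

```python
import numpy as np

def chi2(model, data, cov):
    """Chi-squared of a model correlation function against the data,
    using the full covariance matrix of the data."""
    resid = np.asarray(model, dtype=float) - np.asarray(data, dtype=float)
    return float(resid @ np.linalg.solve(cov, resid))

def fit_sigma8(sigma8_grid, chi2_values):
    """Quadratic interpolation through the three grid points around the
    chi-squared minimum. Returns (best-fitting sigma_8, 1-sigma error)."""
    s = np.asarray(sigma8_grid, dtype=float)
    c = np.asarray(chi2_values, dtype=float)
    i = int(np.argmin(c))
    i = min(max(i, 1), len(s) - 2)     # keep a neighbour on each side
    a, b, _ = np.polyfit(s[i - 1:i + 2], c[i - 1:i + 2], 2)
    best = -b / (2.0 * a)              # vertex of the parabola
    err = 1.0 / np.sqrt(a)             # where chi2 = chi2_min + 1
    return best, err
```

If the minimum lies at the edge of the grid, the vertex is an extrapolation and should be treated with caution.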

A few comments may be made about Fig. [5]. Firstly, some of
the sets of models yielded no value of $\sigma_8$ for which the simulated
correlation function was an acceptable fit to the observed one. The
large $\chi^2$ and $\Delta\chi^2$ values then result in a spuriously small error
bar. This is the case for models 9, 11 and 12, so the constraints
coming from those models should be ignored. Secondly, recall that
we ran only two N-body simulations, using only the outputs given
in Table [1]. In fact, while one was run with a larger value for $\sigma_8$
(and therefore a smaller $\Omega_{\rm M}$ for an output with given $\sigma_8$), it was
started with initial conditions where the different Fourier modes
of the density field were given the same phase as in the low-$\sigma_8$
simulation. This means that the constraints from the different sets
of synthetic galaxy catalogues are not independent, but should be
used to give an indication of the systematic error arising from the
choice of semi-analytic model parameters and the assumed value
of $\Omega_{\rm M}$ (which we do not constrain). Note that we have also plotted
two constraints for each model in Fig. [5]. On the left, with solid
error bars, are the constraints derived just as we have described.
We discuss the estimates on the right, with dashed error bars, in
Section [3.3.1].

Figure 5. Constraints on $\sigma_8$. The x-axis shows the model number, the key
to which is given in Table [2]. The y-axis shows the $1\sigma$ constraint on $\sigma_8$
achieved in that particular model. The points with solid error bars are for
the unmodified catalogues. The points with dashed error bars show how
the constraints change when we make an empirical correction for the effect
of halo assembly bias (Croton, Gao & White 2007 and references therein).
We do this by modifying the correlation function according to the
scale-dependent bias between shuffled and unshuffled GALFORM catalogues in
the Millennium Simulation, as described in Section [3.3.1].



3.3 Other catalogues

3.3.1 Halo assembly bias

The points with dashed error bars in Fig. [5] show a constraint
after we attempt to correct our simulated correlation functions for
the effect of so-called 'halo assembly bias'. This is an effect that
may arise if one of the assumptions we make when populating our
simulation with galaxies is incorrect. We assume that the
distribution from which the properties of the galaxy content of a halo
are drawn depends only on the mass of the halo. In other words,
since the halo merger tree is the basic input to our semi-analytic
model, we assume that the distribution from which the properties
of a halo's merger tree are drawn depends only on halo mass.
This is explicitly the case for the Monte Carlo merger trees we use
here, since it is a result of the underlying extended Press-Schechter
theory (Press & Schechter 1974; Bower 1991; Bond et al. 1991;
Lacey & Cole 1993). This result was also supported by work on
simulations (Lemson & Kauffmann 1999). More recently, however,
the advent of larger simulations with better resolution has
allowed this result to be challenged. Gao, Springel & White (2005)
showed that old haloes of a given mass in the Millennium
Simulation were more strongly clustered than young haloes of the
same mass, while Harker et al. (2006) demonstrated that halo
formation time is a function of halo environment as well as halo
mass using an independent set of merger trees in the same
simulation. Because halo age is a property of the halo merger tree,
this shows that the assumption we have described is violated. In
fact, the variation of N-body merger trees with environment has
been studied in other large simulations (Maulbetsch et al. 2007).
More generally, this environmental dependence formally invalidates
the straightforward application of halo models of galaxy
clustering (Benson et al. 2000; Seljak 2000; Berlind & Weinberg 2002;
Cooray & Sheth 2002). It has stimulated theoretical attempts to
explain the departure from the extended Press-Schechter prediction
(e.g., Sandvik et al. 2007; Wang, Mo & Jing 2007), and to look
for possible effects on the galaxy population both observationally
(Yang, Mo & van den Bosch 2006) and in models (Croton et al.
2007; Zhu et al. 2006).

Our correlation functions are corrected using a GALFORM
catalogue generated in the Millennium Simulation. The version of
GALFORM which generates the catalogue takes as its input the
actual N-body merger tree of each halo (Bower et al. 2006). This
catalogue therefore incorporates environmentally dependent halo
formation. In a similar spirit to Croton et al. (2007), we shuffle this
catalogue, assigning to each halo the galaxy population of a
random halo of the same mass. This destroys any connection between
the environment of a halo of given mass and its galaxy
population. We calculate the galaxy correlation function for a range of
different magnitude thresholds for both the original and shuffled
catalogues. This gives us an estimate of the effect of halo assembly
bias, for GALFORM galaxies at least: we note that while our results
are qualitatively consistent with those of Croton et al. (2007), the
two semi-analytic models do not respond identically. We have
calculated the scale-dependent ratio between the correlation functions,
and then used this ratio (for a sample of appropriate space density)
to correct the correlation functions used for our constraints. This
is intended only to give an estimate of the size of the systematic
error on our constraints coming from halo assembly bias. As Fig. [5]
shows, the error from this source is small in comparison to the
statistical errors, for our model at least.



3.3.2 Luminosity dependence 

We calculate the correlation length of the samples by fitting a power
law to $\xi(r)$ for $2 < r/(h^{-1}\,{\rm Mpc}) < 20$; that is, we parametrize
the correlation function as $\xi(r) = (r/r_0)^{-\gamma}$ where $r_0$ is the
correlation length. We have done this for all our samples of all
luminosities, so we are able to plot the correlation length as a function of
sample space density (or, equivalently, as a function of sample
luminosity threshold) in Fig. [6]. The black line in the plots shows the
corresponding result from the SDSS. The SDSS data show a steady
increase in clustering strength with luminosity (i.e. with decreasing
space density) apart from a feature at $n_g \sim 0.006\,h^3\,{\rm Mpc}^{-3}$
corresponding to the difference between two $M_r^{\rm max} = -20.0$
samples: one has a large, overdense region at $z \sim 0.08$ excised (see
Zehavi et al. 2005 for details) and has lower space density but, as
might be expected, weaker clustering than the sample where this
region is retained.
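A minimal sketch of the power-law fit is given below. It assumes an unweighted least-squares fit in log-log space, which is only one possible choice; the text does not specify the weighting, and the function name is hypothetical.

```python
import numpy as np

def correlation_length(r, xi, r_min=2.0, r_max=20.0):
    """Fit xi(r) = (r / r0)**(-gamma) over r_min < r < r_max.

    r is in h^-1 Mpc. Since log xi = -gamma * log r + gamma * log r0,
    a straight-line fit in log-log space gives both parameters.
    Returns (r0, gamma).
    """
    r = np.asarray(r, dtype=float)
    xi = np.asarray(xi, dtype=float)
    mask = (r > r_min) & (r < r_max) & (xi > 0)
    slope, intercept = np.polyfit(np.log(r[mask]), np.log(xi[mask]), 1)
    gamma = -slope
    r0 = np.exp(intercept / gamma)
    return r0, gamma
```

Applying this to each sample gives one value of $r_0$ per space density.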

For many cosmologies, the Cole2000 model and the updated,
higher baryon fraction C2000hib model do a reasonable job of
matching the luminosity-dependent clustering in the SDSS,
especially for samples of moderate to low space density. Model M,
which invokes superwinds (see Section [2.2]), does not do so well,
predicting very little luminosity dependence. This may be because
the feedback effects are so extreme in large haloes that their
central galaxies are little brighter (if at all) than those at the centre of
less massive, less biased haloes. The other two models tend to have
the opposite problem in the brightest samples: they tend to predict
too high an amplitude of clustering. This could be due to too tight
a relationship between halo mass and galaxy luminosity (perhaps
because in reality feedback is more efficient or more stochastic):
none of the brightest galaxies is scattered into lower mass, less
biased haloes. A generic feature of the GALFORM models seems to
be an upturn in the clustering amplitude at high space density. This
suggests that too many of the faint galaxies generated by the model
reside in high mass haloes. This may be related to the fact that it is
hard to produce a luminosity function with a flat enough faint-end
slope, the excess of faint galaxies perhaps consisting of satellites in
massive haloes.

Figure 6. Correlation length as a function of sample space density for r-band selected galaxies, for the same set of models and with the same colour coding
and line styles as Fig. [3].

Clearly, matching the luminosity-dependent clustering of 
galaxies will continue to be a very stringent test for semi-analytic 
models. Even if the models were provided with the correct cos- 
mology as an input, matching the clustering would still seem to 
require that the models predict the correct galaxy population for 
haloes as a function of luminosity and mass, rather than predicting 
quantities which implicitly average over a range of halo mass, such 
as the (unconditional) galaxy luminosity function. Conversely, if
the models were able to correctly capture the trends of
luminosity-dependent clustering, it would give us more confidence that they
were predicting realistic galaxy populations on a halo-by-halo ba- 
sis, and give a firmer foundation for attempts to constrain cosmol- 
ogy with methods involving semi-analytic catalogues. Though we 
bear this in mind, it seems unrealistic to require a perfect and com- 
plete model of galaxy formation before considering the information 
it can provide us on cosmological parameters. 

The models display a minimum in clustering strength at
roughly the space density of the sample we use for our constraint.
We might therefore expect that using a sample of different space
density may yield a lower estimate for $\sigma_8$ than the sample we
have used above. In fact when we attempt to constrain $\sigma_8$ using
different samples, we obtain high values of $\chi^2$ for all
cosmologies, even for those which appear by eye to be acceptable fits,
or which give reasonable values of $\chi^2$ using only the diagonal
elements of the covariance matrix. We therefore suspect that for
these samples, an estimate of $\sigma_8$ would be severely affected by
noise in the covariance matrix. Using only the diagonal elements
for samples with $n_g = 0.00031$ or $0.01015\,h^3\,{\rm Mpc}^{-3}$ (having
$M_r - 5\log_{10} h < -21.5$ and $-19.5$ respectively) would suggest a
slightly lower $\sigma_8$, in the region 0.85-0.9.

Figure 7. A plot similar to Fig. [3] but for a 2dF sample with $M_{b_J} - 5\log_{10} h < -18$ and a $b_J$-selected GALFORM sample with space density
$0.00249\,h^3\,{\rm Mpc}^{-3}$.

One might worry that the supercluster at $z \sim 0.08$ (mentioned
above) affects our constraints. Zehavi et al. (2005) note, however,
that in their analysis it has no effect on samples fainter than their
$M_r - 5\log_{10} h < -20.0$ sample, while removing it has a very
small effect on brighter samples, producing a negligible drop in
$w_p(r_p)$. If the drop were larger, then using samples without this
region removed could bias our estimate of $\sigma_8$ upwards. More likely,
the supercluster may cause a slight underestimate of the size of our
error bars, since the jackknife samples used to calculate the
covariance matrix are smaller than the supercluster. This prevents the
jackknife method from fully capturing the variance in the density
field.



3.3.3 Constraints from the 2dFGRS 

We have chosen to use the (r-band selected) SDSS rather than the
($b_J$-band selected) 2dFGRS for our main constraint on $\sigma_8$, since
the prediction for the luminosity of galaxies in bluer bands depends
more heavily on recent star formation. It therefore tends to be more
model-dependent than the prediction for redder bands, where there
is a larger dependence on total stellar mass. None the less, the
2dFGRS provides very valuable data on galaxy clustering, and an
accurate galaxy formation model should give constraints on $\sigma_8$ which
are consistent between the two datasets. In addition, the analysis
of satellite fractions in the 2dFGRS by van den Bosch et al. (2005)
suggested, if somewhat indirectly, that the 2dF data prefer a
relatively low $\sigma_8$. In a similar spirit to the analysis we perform here,
this constraint on $\sigma_8$ came about independently of other datasets.
It may therefore be interesting to see whether our relatively high
value of $\sigma_8$ coming from galaxy clustering data alone is driven by
the data (in which case our estimate for $\sigma_8$ when using 2dFGRS
data should be consistent with theirs) or by other factors. Note, for
example, that Pan & Szapudi (2005) quote a high preferred value
for $\sigma_8$ in their 2dFGRS clustering analysis - albeit concentrating
on the three-point function - though their error bar extends to low
values around their central value of $\sigma_8 = 0.93$.

The 2dFGRS clustering data we use are an updated version of
the analysis of Norberg et al. (2001, 2002) (Norberg et al., in prep.).
This is the same dataset as used by Tinker et al. (2007) in their
study of the luminosity dependence of the galaxy pairwise velocity
dispersion. We compare the sample with $M_{b_J} - 5\log_{10} h < -18$
to corresponding catalogues from our models in Fig. [7]. The grid of
models used is the same as for Fig. [3], but we select samples
using a $b_J$ magnitude threshold so as to match the space density of
the 2dF sample. The threshold is chosen so that the space density,
$0.00249\,h^3\,{\rm Mpc}^{-3}$, is similar to that of our main SDSS sample.
The model clustering appears to depend more weakly on
cosmology than for the r-selected samples, but there is still a clear trend
and so we would hope still to be able to use these data to estimate
$\sigma_8$.

We calculate $\chi^2$ between the data and the model using a
principal component analysis, again ignoring the errors on the
model correlation functions as we did for the SDSS. This
analysis is performed on the dimensionless projected correlation
function, $w_p(r_p)/r_p$, denoted $\Xi(\sigma)/\sigma$ by Norberg et al. (2002). We use
only the first six principal components, which account for over 99
per cent of the variance. Statistical errors in the estimate of the
principal components dominate the contribution to $\chi^2$ of the less
significant components. This illustrates the problems which would
arise if we instead used the whole covariance matrix, as highlighted
in Section [3.3.2].
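A truncated principal-component $\chi^2$ of this kind can be sketched as below. This is an illustration under assumptions: it diagonalizes the covariance matrix directly, whereas an analysis may instead work with the correlation matrix of normalized residuals, and the function name is hypothetical.

```python
import numpy as np

def chi2_pca(model, data, cov, n_keep=6):
    """Chi-squared using only the n_keep most significant principal
    components of the data covariance matrix."""
    resid = np.asarray(model, dtype=float) - np.asarray(data, dtype=float)
    evals, evecs = np.linalg.eigh(cov)            # ascending eigenvalues
    order = np.argsort(evals)[::-1][:n_keep]      # largest variance first
    proj = evecs[:, order].T @ resid              # residual along each PC
    return float(np.sum(proj ** 2 / evals[order]))
```

Dropping the low-variance components removes the terms with the smallest eigenvalues, which are the ones most easily inflated by noise in the estimated covariance matrix.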

The resulting constraints on $\sigma_8$ are given in Fig. [8]. The model
numbering is the same as for Fig. [5] and is given in Table [2]. Noting
the change in axis scale from Fig. [5], we see that the statistical error
on $\sigma_8$ from the 2dF sample is comparable to that from the SDSS.
While most of the grids of models yield $\sigma_8$ estimates similar to
those obtained from the SDSS (if perhaps a little lower), there are
several model grids which give significantly lower values of $\sigma_8$ for
the $b_J$-selected samples than they did for the r-selected ones. In
fact, these grids (numbers 1, 4 and 10) all use the C2000hib model.
Our results therefore suggest that in this model the blue galaxies
are more clustered than in the others, and hence lower dark matter
clustering is required to match the observational result. This may be
because the feedback excessively reddens isolated galaxies,
leaving a larger proportion of the bluer galaxies in more massive, more
clustered haloes. In any case it supports the idea that the
clustering of model galaxies selected in bluer wavebands may be more
dependent on the semi-analytic prescription.



4 DISCUSSION 

As we note in Section [3.2], the constraints from the 12 different sets
of catalogues in Fig. [5] are not independent, since the underlying
N-body simulations in each case were seeded with the same phases.
This does, though, mean that we can use the scatter between the
catalogues to estimate the systematic error in the constraint
arising from our choice of semi-analytic prescription. While we only
have three different models (along with variants in which we do not
tweak the parameters to match the r-band luminosity function), we
can see that they differ quite strongly in the luminosity dependence
(Fig. [6]) and colour dependence (Fig. [8]) of their clustering. They
may, then, be representative of the scatter we can expect between
GALFORM models that fit the r-band luminosity function and the
primary constraints listed in Section [2.2]. From the range of $\sim 0.07$
in the value of the best-fitting $\sigma_8$ between sets of catalogues, we
estimate a systematic error from this source of $\pm 0.04$. The
average size of the statistical error bars among the catalogues for which
the best-fitting $\sigma_8$ was a good fit in a $\chi^2$ sense suggests a
statistical error of, again, $\pm 0.04$. Adding these errors in quadrature and
attaching the result to the mean of the best-fitting values in these
catalogues gives a final figure of $\sigma_8 = 0.97 \pm 0.06$ for the r-band samples.

Figure 8. Constraints on $\sigma_8$ for the 2dF sample. The model numbering is
the same as for Fig. [5] and is given in Table [2].
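As a check on the arithmetic, the quadrature sum of the two quoted error contributions can be reproduced directly. The helper name below is hypothetical.

```python
import math

def combine_in_quadrature(*errors):
    """Combine independent error contributions in quadrature."""
    return math.sqrt(sum(e * e for e in errors))

systematic = 0.04   # from the scatter between sets of catalogues
statistical = 0.04  # typical statistical error bar
total = combine_in_quadrature(systematic, statistical)
print(f"{total:.3f}")  # 0.057, which rounds to the quoted 0.06
```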

Within the scope of the parameter variations we investigated,
if we assume that $\sigma_8 = 0.8$ then none of our models gives us a good
fit to the SDSS clustering over the full range of scales. This does not
preclude the possibility that models that do not fit the luminosity
function, or that include different physics to our particular
semi-analytic model, may achieve such a fit.

Our value for $\sigma_8$ is clearly at odds with the most
striking recent measurement, from the three-year WMAP data. Using
those data alone, Spergel et al. (2007) quote $\sigma_8 = 0.761^{+0.049}_{-0.048}$
for flat, power-law $\Lambda$CDM, and this value is not significantly
increased (though the error bars tighten) when the data are
analysed jointly with galaxy clustering or supernova data. There is,
though, some tension between the WMAP result and results from
weak lensing surveys, which provide rather complementary
parameter constraints (Tereno et al. 2005). Spergel et al.'s joint
analysis of WMAP and the CFHTLS lensing survey (Hoekstra et al.
2006; Semboloni et al. 2006) pulls their estimate up to
$\sigma_8 = 0.827$ (with asymmetric errors), with the lensing data alone
favouring even higher values. Benjamin et al. (2007) have combined data
from the CFHTLS and other surveys to give
$\sigma_8 (\Omega_{\rm M}/0.24)^{0.59} = 0.84 \pm 0.07$.
Lyman-$\alpha$ forest data can be used to constrain the power spectrum;
Seljak, Makarov, McDonald et al. (2005) quote $\sigma_8 = 0.90 \pm 0.03$
(reducing to 0.84 incorporating the new constraints on reionization
from the three-year WMAP data). Measurements of cluster
abundance have frequently been used to constrain $\sigma_8$, but provide a very
wide range of estimates because of the difficulty in relating the
properties of an observed cluster to its mass (e.g., Rasia et al. 2005).
Recent estimates are, though, consistent with the WMAP
determination of $\sigma_8$ (e.g. Pierpaoli et al. 2003).

The overall picture of the value of $\sigma_8$ from other methods is
therefore a little confusing, but even the highest recent estimates are
only marginally consistent with ours. A possible source of tension
between our constraints and those from WMAP is that we have
assumed a spectral index $n_s = 1$, while the best-fitting WMAP $\sigma_8$
is quoted for their best-fitting $n_s$ of approximately 0.95. As one
can see from the lower right panel of figure 10 of Spergel et al.
(2007), the constraints on these two parameters are correlated. Even
so, increasing $n_s$ to unity would only correspond to an increase in
$\sigma_8$ of 0.1, which is not enough to eliminate the discrepancy with
our result. A further complication is that the size of $n_s$ is not the
only difference in the initial $P(k)$ between the WMAP three-year
constraints and our model. Though ideally one might like to repeat
our tests using the power spectrum shape inferred by Spergel et al.
(2007), we note that despite some parameter changes from the
first-year WMAP constraints, the net change in the power spectrum is
small. Furthermore, other datasets that are less dependent on $n_s$
also prefer a lower $\sigma_8$ than we do, so even were we to infer a higher
$\sigma_8$ from the CMB than currently favoured, our problems would not
be solved.

As far as using galaxy data alone goes, methods involving
higher-order correlations, particularly the three-point correlation
function (or its Fourier counterpart the bispectrum) are promising,
for example because of their ability to constrain galaxy bias. The
addition of dynamical information, for example redshift space
distortions or the pairwise velocity dispersion (PVD), can also help
constrain cosmological parameters and galaxy bias. An analysis
including PVD information in the conditional luminosity function
(CLF) approach, by Yang et al. (2004, 2005), suggested relatively
low values of $\sigma_8$, though this was inferred from their models with
high $\sigma_8$ since low-$\sigma_8$ simulations were not explicitly analysed.
Results from halo occupation distribution (HOD) modelling, an approach perhaps
more akin to ours - for example the analysis of the cluster
mass-to-light ratio by Tinker et al. (2005) - have tended also to favour a
low $\sigma_8$. Zheng & Weinberg (2007), and references therein, provide
a detailed explanation of HOD modelling and its use in
constraining parameters using galaxy data alone.

An alternative approach using the 2dFGRS is employed by
van den Bosch et al. (2005). They study the abundance and radial
distribution of satellite galaxies within the CLF framework,
using mock galaxy catalogues produced by a semi-analytic code to
calibrate their model. This calibration quantifies the impact of the
inevitable imperfections in the halo finder that lead to satellite
galaxies being spuriously identified as central galaxies of separate
haloes, and vice versa. It also accounts for incompleteness effects
in the 2dFGRS. Their results are consistent with other CLF
analyses in suggesting that simultaneously matching the observed cluster
mass-to-light ratio and the fraction of satellite galaxies in the
2dFGRS requires a low value of $\sigma_8$, lowering the abundance of very
massive haloes with a great number of satellites. Again, they do not
directly construct mock galaxy redshift surveys for a low $\sigma_8$ model,
but using the same calibration parameters as for their $\sigma_8 = 0.9$
model leads them to believe that adopting $\sigma_8 = 0.7$ provides a
better fit to the data.

The parameters of the conditional luminosity functions which
van den Bosch et al. (2005) use to fit the 2dFGRS data for low and
high $\sigma_8$ are tabulated in their paper. We have used these parameters
to construct the corresponding mean occupation functions - that is,
the mean number of galaxies in a halo of given mass - then used
these functions to populate the Millennium Simulation, and outputs
of our Run 1 having $\sigma_8 = 0.7$ and $\sigma_8 = 0.9$. This is done as
follows: for each halo in the simulation, we look up the mean number
of galaxies in a halo of this mass, $\langle N(M) \rangle$. If $\langle N \rangle \le 1$ the halo
receives either a single, central galaxy (with probability $\langle N \rangle$) or no
galaxies at all (with probability $1 - \langle N \rangle$). If $\langle N \rangle > 1$ then the halo
receives a central galaxy, plus a number of satellite galaxies drawn
from a Poisson distribution with mean $\langle N \rangle - 1$. The use of these
probability distributions follows the work of Kravtsov et al. (2004)
and Zheng et al. (2005); in addition, the halo occupation
distribution of GALFORM galaxies in our models is consistent with this
scheme. Once we know the number of galaxies in a given halo, we
then place the galaxies according to the scheme described in
Section [2.3].
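The occupation rule described above translates directly into code. In this sketch the function name is hypothetical, and the mean occupation $\langle N(M) \rangle$ is assumed to have been looked up already for the halo in question.

```python
import numpy as np

def draw_occupation(mean_n, rng):
    """Draw (number of centrals, number of satellites) for one halo.

    mean_n : mean number of galaxies <N(M)> for a halo of this mass
    rng    : numpy random Generator
    """
    if mean_n <= 1.0:
        # Bernoulli trial: a single central with probability <N>.
        n_central = int(rng.random() < mean_n)
        return n_central, 0
    # One central plus Poisson-distributed satellites, mean <N> - 1.
    return 1, int(rng.poisson(mean_n - 1.0))
```

By construction the expected number of galaxies per halo equals $\langle N \rangle$ in both regimes.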

We find that the mean occupation functions look reasonable,
though in some cases they are not quite monotonic, and do not
generally exhibit so clean a 'step function + power law' form as the
GALFORM mean occupation functions. In our approach to
populating simulations with the CLF HODs, we do not assign a luminosity
to each galaxy, and so we construct a different catalogue for each
magnitude threshold we wish to analyse. The space density of these
catalogues as a function of magnitude gives us a cumulative
luminosity function, and we have checked that this matches the
2dFGRS $b_J$-band luminosity function for the $\sigma_8 = 0.9$ catalogue, as it
should by construction. For $\sigma_8 = 0.7$, the CLF catalogues give too
steep a faint-end slope of the luminosity function, so that strictly
speaking the CLF is incorrect, though this should not be a concern
for galaxies of the luminosity we use for our clustering analysis.
We also find that the CLF produces clustering consistent with our
$b_J$-selected GALFORM samples.

The similar clustering in the GALFORM and CLF catalogues
raises the question of what drives the difference in preferred $\sigma_8$.
The similarity between values from our SDSS and 2dF analyses
suggests it is not driven entirely by the data. Moreover, it seems at
odds with the conclusions of Tinker, Weinberg & Warren (2006),
who show that in their HOD model the projected correlation
function tightly constrains the satellite fraction. The answer may lie in
the fact that their parametrized HOD, and our semi-analytic HOD,
are unable to match the form of the mean occupation functions
produced by the CLF approach, in which the parametrizations adopted
for different parts of the CLF are a few steps removed from HOD
parameters.

It would be exciting to conclude that our results support a real 
difference between low-redshift estimates of $\sigma_8$ (e.g. from weak
lensing) and estimates using CMB data, and that this indicates 
something about, say, evolving dark energy. Other analyses find 
lower values, though, and there are still one or two concerns about 
our constraints. The mean occupation functions from our high $\Omega_m$,
low $\sigma_8$ GALFORM runs tend to be more ragged, perhaps indicating
a difficulty with our modelling in these cosmologies. While we
have checked that galaxy samples selected in a different waveband 
give similar results, the anomalous C2000hib model gives some 
cause for concern. Similarly, while samples with a different magni- 
tude threshold appear to give consistent results, the luminosity de- 
pendence of clustering differs between models. These samples also 
have a smaller effective volume, which seems to give rise to problems
in the error analysis. 

As we hoped, a large part of our constraint comes from the 
intermediate-scale clustering for which halo-based models are most 
necessary, but this is affected by the scheme for placing galaxies 
within haloes. The largest difference between our high and low $\sigma_8$
models (and between our low $\sigma_8$ models and the data) is manifested
at small scales. The radial distribution of satellite galaxies within
haloes is again different in the GALFORM model of Bower et al.
(2006), which uses N-body substructure data, and this is clearly an
area of galaxy formation modelling which requires further attention.
We have repeated our analysis of the SDSS data using only
points with $r_p > 2\,h^{-1}\,\mathrm{Mpc}$. While the best-fitting value
of $\sigma_8$ decreases by approximately six per cent to $\sim 0.91$, the
error bars
approximately double in size, illustrating the importance of making 
use of smaller scales. 

While bearing the above caveats in mind, we would like to em- 
phasize the tight constraints available in principle using our tech- 
nique, and to note that the results presented here are consistent (if 
marginally) with some other analyses which use only low-redshift 
data. If the tightness of the constraints seems surprising, consider 
firstly that in the absence of uncertainty about galaxy bias, the con- 
straints would be extremely tight as the amplitude of the corre- 
lation function is very well determined. Secondly, by comparing 
model galaxies to real galaxies of the same abundance, we fac- 
tor out most of the dependence on the details of the semi-analytic 
model. Any change that makes galaxies monotonically brighter or 
fainter will have no effect on our comparison between models and 
data. Thirdly, the number of galaxies assigned to each halo in the 
semi-analytic model is principally determined by the merger his- 
tory of that halo, and this is well described by the extended Press- 
Schechter theory. Hence this is not a major uncertainty in the semi- 
analytic models: unlike purely statistical descriptions as provided 
by the HOD or CLF, our semi-analytic models do not have the free- 
dom to define arbitrary HODs. These are instead largely determined 
by the merger trees. 
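
The second point, that comparing samples at fixed abundance removes
any monotonic change in galaxy brightness, can be checked with a short
sketch. This is an illustration under stated assumptions rather than
the selection code used here: the function name `select_by_density`,
the lognormal toy luminosities and the box volume are hypothetical;
only the space density of $0.00308\,h^3\,\mathrm{Mpc}^{-3}$ is taken from
the text.

```python
import numpy as np


def select_by_density(luminosity, n_g, volume):
    """Return sorted indices of the brightest galaxies at space density n_g.

    luminosity : array of galaxy luminosities (any monotonic proxy works).
    n_g        : target space density, in h^3 Mpc^-3.
    volume     : catalogue volume, in h^-3 Mpc^3 (hypothetical here).
    """
    n_select = int(round(n_g * volume))
    brightest_first = np.argsort(luminosity)[::-1]
    return np.sort(brightest_first[:n_select])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    lum = rng.lognormal(mean=0.0, sigma=1.0, size=10000)  # toy luminosities
    volume = 1.0e6                                        # toy box volume

    original = select_by_density(lum, 0.00308, volume)
    # Make every galaxy brighter in a monotonic (rank-preserving) way.
    brightened = select_by_density(2.5 * lum ** 1.3, 0.00308, volume)

    print(np.array_equal(original, brightened))
```

Because the selection depends only on luminosity rank, the same
galaxies, and hence the same correlation function, are obtained before
and after the transformation.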

We also remark that other constraints using galaxy data alone, 
which at first sight seem inconsistent with ours, use different tech- 
niques or different data or both. These inconsistencies arise even 
though galaxy formation models can reproduce statistics which im- 
plicitly average over a range of halo mass - such as the uncondi- 
tional galaxy luminosity function - reasonably well. This illustrates 
that matching the luminosity- and colour-dependence of clustering, 
at small and large scales, is an important and very stringent require- 
ment on galaxy formation models. 



5 CONCLUSIONS

We have compared the SDSS projected two-point correlation function
at a galaxy space density $n_g = 0.00308\,h^3\,\mathrm{Mpc}^{-3}$ to a suite
of populated simulations generated using the N-body code GADGET2
and the semi-analytic code GALFORM. Because we require N-body data
in a great number of different cosmologies, we have relabelled and
rescaled some simulation outputs using the techniques of Zheng et al.
(2002) to avoid the need to run a full simulation for each cosmology
in our grid. The galaxy catalogues are self-consistent, GALFORM being
run afresh for each cosmology we study.

We have attempted to estimate the systematic error in our value of
$\sigma_8$ due to the particular choice of semi-analytic model by
running three different GALFORM variants in each cosmology. For
each of these variants we generate a catalogue in which the GALFORM
parameters are adjusted to match the SDSS $^{0.1}r$-band luminosity
function and a catalogue in which the parameters take the same values
as they take in the $(\Omega_m, \sigma_8) = (0.3, 0.8)$ cosmology.
We obtain the result $\sigma_8 = 0.97 \pm 0.04$ (statistical)
$\pm\, 0.04$ (systematic). This constraint is impressively tight, given
we have attempted to narrow the range of assumptions we require to
produce an estimate of $\sigma_8$ by using only one well understood, low
redshift dataset. By choosing grids of cosmologies which lie on
cluster-normalized curves, $\sigma_8 \Omega_m^{0.5} = \mathrm{const}$, we
have shown that the degeneracies inherent in our approach are
different to those inherent in cosmic shear measurements, which
provide an important low redshift constraint in $\Omega_m$ and
$\sigma_8$. In fact our method gives an almost pure constraint on
$\sigma_8$. We have shown that in our model, halo assembly bias does not
severely affect our constraint on $\sigma_8$, though this may not be
universally the case for other semi-analytic codes. If it were not
the case, we would expect it to bias our estimate of $\sigma_8$ high,
since failing to account for halo assembly bias tends to lower the
amplitude of model correlation functions, requiring an increased
$\sigma_8$ in the model to compensate.

We obtain similar values for $\sigma_8$ if we use samples with a lower
or higher galaxy space density, but the error analysis is less secure.
We also obtain similar values using a principal component analysis
of 2dFGRS data, for a sample of similar space density to that used
for our primary constraint. We note, though, that the clustering of
galaxies selected in bluer wavebands appears to be more
model-dependent, as one might expect.

Our estimate of $\sigma_8$ looks high compared to the values obtained
by many recent measurements, in particular those from the WMAP
experiment. While we note that this tension has some interesting
consequences if it persists, we have also pointed out how apparent
inconsistencies between our results and other low redshift
constraints may arise. Small and intermediate scales in the correlation
functions contribute strongly to $\chi^2$ and hence to our constraint on
$\sigma_8$, and yet are not as well understood as the large scales. This
is clearly an area where further modelling effort is required.
Moreover, we will not be completely assured that semi-analytic models
capture the phenomenology of the galaxy population sufficiently
well for high precision cosmological constraints until they are able
to match the observed colour- and luminosity-dependent clustering
of galaxies. The models need to be able to reproduce the properties
of the observed galaxy population on a halo-by-halo basis, not
just the properties averaged spatially or over luminosity. This
implies that if cosmological parameters can be tightly constrained by
other techniques, measurements of galaxy clustering will continue
to provide stringent tests for models of galaxy formation.